applied science
CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
Zhang, Xinyu, Zhang, Pei, Luo, Shuang, Tang, Jialong, Wan, Yu, Yang, Baosong, Huang, Fei
Cultural competence, defined as the ability to understand and adapt to multicultural contexts, is increasingly vital for large language models (LLMs) in global environments. While several cultural benchmarks exist to assess LLMs' cultural competence, current evaluations suffer from fragmented taxonomies, domain specificity, and heavy reliance on manual data annotation. To address these limitations, we introduce CultureSynth, a novel framework comprising (1) a comprehensive hierarchical multilingual cultural taxonomy covering 12 primary and 130 secondary topics, and (2) a Retrieval-Augmented Generation (RAG)-based methodology leveraging factual knowledge to synthesize culturally relevant question-answer pairs. The CultureSynth-7 synthetic benchmark contains 19,360 entries and 4,149 manually verified entries across 7 languages. Evaluation of 14 prevalent LLMs of different sizes reveals clear performance stratification led by ChatGPT-4o-Latest and Qwen2.5-72B-Instruct. The results demonstrate that a 3B-parameter threshold is necessary for achieving basic cultural competence, models display varying architectural biases in knowledge processing, and significant geographic disparities exist across models. We believe that CultureSynth offers a scalable framework for developing culturally aware AI systems while reducing reliance on manual annotation\footnote{Benchmark is available at https://github.com/Eyr3/CultureSynth.}.
Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving
Englmeier, Stefan, Büttner, Max A., Winter, Katharina, Flohr, Fabian B.
--Autonomous driving systems must operate reliably in safety-critical scenarios, particularly those involving unusual or complex behavior by V ulnerable Road Users (VRUs). Identifying these edge cases in driving datasets is essential for robust evaluation and generalization, but retrieving such rare human behavior scenarios within the long tail of large-scale datasets is challenging. T o support targeted evaluation of autonomous driving systems in diverse, human-centered scenarios, we propose a novel context-aware motion retrieval framework. Our method combines Skinned Multi-Person Linear (SMPL)-based motion sequences and corresponding video frames before encoding them into a shared multimodal embedding space aligned with natural language. Our approach enables the scalable retrieval of human behavior and their context through text queries. This work also introduces our dataset WayMoCo, an extension of the Waymo Open Dataset. It contains automatically labeled motion and scene context descriptions derived from generated pseudo-ground-truth SMPL sequences and corresponding image data. Our approach outperforms state-of-the-art models by up to 27.5% accuracy in motion-context retrieval, when evaluated on the WayMoCo dataset. Training and evaluating machine learning models for autonomous driving heavily depends on the availability of large-scale, high-quality data.
Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology
Felde, Sabine, Buchkremer, Rüdiger, Chehab, Gamal, Thielscher, Christian, Distler, Jörg HW, Schneider, Matthias, Richter, Jutta G.
Large language models (LLMs) show promise for supporting clinical decision-making in complex fields such as rheumatology. Our evaluation shows that smaller language models (SLMs), combined with retrieval-augmented generation (RAG), achieve higher diagnostic and therapeutic performance than larger models, while requiring substantially less energy and enabling cost-efficient, local deployment. These features are attractive for resource-limited healthcare. However, expert oversight remains essential, as no model consistently reached specialist-level accuracy in rheumatology.
OAK -- Onboarding with Actionable Knowledge
Devènes, Steve, Capallera, Marine, Cherix, Robin, Mugellini, Elena, Khaled, Omar Abou, Carrino, Francesco
The loss of knowledge when skilled operators leave poses a critical issue for companies. This know-how is diverse and unstructured. We propose a novel method that combines knowledge graph embeddings and multi-modal interfaces to collect and retrieve expertise, making it actionable. Our approach supports decision-making on the shop floor. Additionally, we leverage LLMs to improve query understanding and provide adapted answers. As application case studies, we developed a proof-of-concept for quality control in high precision manufacturing.
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation
Tran, Khanh-Tung, O'Sullivan, Barry, Nguyen, Hoang D.
Recent advances in Large Language Models (LLMs) have demonstrated promising knowledge and reasoning abilities, yet their performance in multilingual and low-resource settings remains underexplored. Existing benchmarks often exhibit cultural bias, restrict evaluation to text-only, rely on multiple-choice formats, and, more importantly, are limited for extremely low-resource languages. To address these gaps, we introduce IRLBench, presented in parallel English and Irish, which is considered definitely endangered by UNESCO. Our benchmark consists of 12 representative subjects developed from the 2024 Irish Leaving Certificate exams, enabling fine-grained analysis of model capabilities across domains. By framing the task as long-form generation and leveraging the official marking scheme, it does not only support a comprehensive evaluation of correctness but also language fidelity. Our extensive experiments of leading closed-source and open-source LLMs reveal a persistent performance gap between English and Irish, in which models produce valid Irish responses less than 80\% of the time, and answer correctly 55.8\% of the time compared to 76.2\% in English for the best-performing model. We release IRLBench (https://huggingface.co/datasets/ReliableAI/IRLBench) and an accompanying evaluation codebase (https://github.com/ReML-AI/IRLBench) to enable future research on robust, culturally aware multilingual AI development.
From PDFs to Structured Data: Utilizing LLM Analysis in Sports Database Management
This study investigates the effectiveness of Large Language Models (LLMs) in processing semi-structured data from PDF documents into structured formats, specifically examining their application in updating the Finnish Sports Clubs Database. Through action research methodology, we developed and evaluated an AI-assisted approach utilizing OpenAI's GPT-4 and Anthropic's Claude 3 Opus models to process data from 72 sports federation membership reports. The system achieved a 90% success rate in automated processing, successfully handling 65 of 72 files without errors and converting over 7,900 rows of data. While the initial development time was comparable to traditional manual processing (three months), the implemented system shows potential for reducing future processing time by approximately 90%. Key challenges included handling multilingual content, processing multi-page datasets, and managing extraneous information. The findings suggest that while LLMs demonstrate significant potential for automating semi-structured data processing tasks, optimal results are achieved through a hybrid approach combining AI automation with selective human oversight. This research contributes to the growing body of literature on practical LLM applications in organizational data management and provides insights into the transformation of traditional data processing workflows.
Impacts of Anthropomorphizing Large Language Models in Learning Environments
Schaaff, Kristina, Heidelmann, Marc-André
Similarly to the factors of anthropomorphism summarized by [11], we identified the following factors as relevant when Large Language Models (LLMs) are increasingly being used LLM-based chatbots are used in learning scenarios: The learning in learning environments to support teaching--be it as learning agent, i.e., chatbot, the learner itself, and environmental companions or as tutors [1]-[3]. With our contribution, we factors which influence the learner (see Figure 1). According to the media equation [4], people tend to respond to media in the same way as they would respond to another person. A study conducted by the Georgia Institute of Technology showed that chatbots can be successfully implemented in learning environments. As LLM-based chatbots such as OpenAI's GPT Looking at the agent, several factors can contribute to series are increasingly used in educational tools, it is important anthropomorphization. Cognitive intelligence refers to the to understand how the attribution processes to LLM-based ability to perceive, reason, and act on problems; to combine chatbots in terms of anthropomorphization affect learners' efficient, useful, goal-oriented, and autonomous actions with emotions.
Pneumonia Detection on chest X-ray images Using Ensemble of Deep Convolutional Neural Networks
Mabrouk, Alhassan, Redondo, Rebeca P. Díaz, Dahou, Abdelghani, Elaziz, Mohamed Abd, Kayed, Mohammed
neumonia is a life-threatening lung infection resulting from several different viral infections. Identifying and treating pneumonia on chest X-ray images can be difficult due to its similarity to other pulmonary diseases. Thus, the existing methods for predicting pneumonia cannot attain substantial levels of accuracy. Therefore, this paper presents a computer-aided classification of pneumonia, coined as Ensemble Learning (EL), to simplify the diagnosis process on chest X-ray images. Our proposal is based on Convolutional Neural Network (CNN) models, which are pre-trained CNN models that have been recently employed to enhance the performance of many medical tasks instead of training CNN models from scratch. We propose to use three well-known CNN pre-trained (DenseNet169, MobileNetV2 and Vision Transformer) using the ImageNet database. Then, these models are trained on the chest X-ray data set using fine-tuning. Finally, the results are obtained by combining the extracted features from these three models during the experimental phase. The proposed EL approach outperforms other existing state-of-the-art methods, and it obtains an accuracy of 93.91% and a F1-Score of 93.88% on the testing phase. Identifying and treating pneumonia on chest X-ray images can be difficult due to its similarity to other pulmonary diseases. Thus, the existing methods for predicting pneumonia cannot attain substantial levels of accuracy. Therefore, this paper presents a computer-aided classification of pneumonia, coined as Ensemble Learning (EL), to simplify the diagnosis process on chest X-ray images. Our proposal is based on Convolutional Neural Network (CNN) models, which are pretrained CNN models that have been recently employed to enhance the performance of many medical tasks instead of training CNN models from scratch.
Modelling and simulation of a commercially available dielectric elastomer actuator
Sohlbach, Lukas, Hobbani, Hamza, Blase, Chistopher, Perez-Peña, Fernando, Schmidt, Karsten
In order to fully harness the potential of dielectric elastomer actu-ators (DEAs) in soft robots, advanced control methods are need-ed. An important groundwork for this is the development of a control-oriented model that can adequately describe the underly-ing dynamics of a DEA. A common feature of existing models is that always custom-made DEAs were investigated. This makes the modelling process easier, as all specifications and the struc-ture of the actuator are well known. In the case of a commercial actuator, however, only the information from the manufacturer is available and must be checked or completed during the modelling process. The aim of this paper is to explore how a commercial stacked silicone-based DEA can be modelled and how complex the model should be to properly replicate the features of the actu-ator. The static description has demonstrated the suitability of Hooke's law. In the case of dynamic description, it is shown that no viscoelastic model is needed for control-oriented modelling. However, if all features of the DEA are considered, the general-ized Kelvin-Maxwell model with three Maxwell elements shows good results, stability and computational efficiency.
Extending Cobot's Motion Intention Visualization by Haptic Feedback
Pascher, Max, Franzen, Til, Kronhardt, Kirill, Gerken, Jens
Nowadays, robots are found in a growing number of areas where they collaborate closely with humans. Enabled by lightweight materials and safety sensors, these cobots are gaining increasing popularity in domestic care, supporting people with physical impairments in their everyday lives. However, when cobots perform actions autonomously, it remains challenging for human collaborators to understand and predict their behavior, which is crucial for achieving trust and user acceptance. One significant aspect of predicting cobot behavior is understanding their motion intention and comprehending how they "think" about their actions. Moreover, other information sources often occupy human visual and audio modalities, rendering them frequently unsuitable for transmitting such information. We work on a solution that communicates cobot intention via haptic feedback to tackle this challenge. In our concept, we map planned motions of the cobot to different haptic patterns to extend the visual intention feedback.